---
title: Mojo vision
sidebar_label: Vision
description: Our motivations and the design decisions that define the Mojo programming language
---

Our vision for Mojo is to be the one programming language developers need to
target diverse hardware—CPUs, GPUs, and other accelerators—using Python's
intuitive syntax combined with modern systems programming capabilities.

Although this vision focuses on the Mojo language, we recognize it's just one
part of a larger Mojo ecosystem. When combined, the developer tools, the
community, and the landscape of Mojo libraries are arguably more important at
scale. However, the purpose of this document is to share more about our motives
and aspirations for the language itself, because it supports everything else.

This vision serves as a baseline to guide our decision-making as the language
continues to evolve. This is a "directional" vision, not an engineering plan.
For a look at some of the planned work, see the [Mojo roadmap](/mojo/roadmap/).

## Mojo's role in Modular's mission

Mojo plays a key role in Modular's mission to [democratize AI
compute](https://www.modular.com/democratizing-ai-compute). Let's break down
the mission into its component parts:

- **Democratize**: This is a social statement, saying that we want to free,
unlock, and enable more people to participate.

- **AI compute**: We have long passed the end of Moore's law, and are awash
with a wide range of accelerators: GPUs, TPUs, and accelerated CPUs, spanning
IoT, edge, client, datacenter, and supercomputer applications. (Our ambition is
to eventually expand into "All Compute," but there are a few steps between here
and there.)

Mojo is how we bring these two ideas together—democratization and AI
compute—into a single, coherent solution. To achieve this we want to:

- **Unite developers** across domains, skill levels, and backgrounds
(enterprise engineers, academics, hobbyists, etc.). We aim to solve the
complexity of juggling Python, C++, Rust, CUDA, and more (the "N language
problem") by enabling developers to grow their skill sets incrementally within
a single language.

- **Unify hardware** by giving developers access to a wide range of
hardware—CPUs, GPUs, TPUs, and emerging accelerators—with consistent tools and
programming models.

It should be easy to start using Mojo as a Python developer, and incrementally
adopt new Mojo features to master CPU performance and scale your abilities into
GPU programming and other accelerated hardware.

This mission is vast and ambitious. Many have attempted to solve this and have
fallen short. Achieving it will take years of focused development, but we feel
it is worth doing. We believe Mojo can help unlock creativity, productivity,
and applications we haven't yet imagined.

## Why Mojo was built from scratch

Modern accelerators are complex and very different from traditional CPUs. They
have features like tensor cores, systolic arrays, dedicated convolutional
units, explicit memory hierarchies, memory transfer accelerators (such as the
Tensor Memory Accelerator on Hopper and Blackwell GPUs), and a variety of
exotic and rapidly evolving data types like float6. Achieving our mission to
unify hardware development means Mojo must provide full programmability and
deliver the full performance potential of any given accelerator chip.

There are only three ways to tackle the problems we are trying to solve. Let's
briefly evaluate the pros and cons of each:

1. **Extend an existing language like C++, Rust, Julia, or Swift**:

    - **Pro**: You get an existing implementation and community.

    - **Con**: None of these languages supports the hardware features we need
      because they were designed for CPUs. They are also all 10+ years old and
      provide neither the modern meta-programming features we need nor the
      AI-specific data types (such as float6) that accelerators require.

2. **Create an embedded DSL for a language like Python or C++**:

    - **Pro**: This is comparatively easy to implement.

    - **Con**: The tooling, UX, and predictability of these systems are poor,
      and they are limited by the base language's syntax. This is especially
      limiting if you're trying to introduce fundamental new concepts, because
      you can't change the grammar of Python or C++.
      [More about eDSLs](https://www.modular.com/blog/democratizing-ai-compute-part-7-what-about-triton-and-python-edsls)

3. **Build an entirely new programming language from scratch**:

    - **Pro**: You get full control to create the best quality result.

    - **Con**: This is extremely expensive and difficult to do. There are many
      ways to get this wrong and you must have a strong set of principles to
      guide development. For comparison, CUDA is a C++ extension and
      runtime—nothing as ambitious as a new programming language.

We ruled out the first two options because they're insufficient for achieving
the full scope of our vision. Many previous systems had a promising start but
ultimately hit a ceiling on their generality and usability, which prevented
them from fully democratizing AI compute. A core reason is that they failed to
understand the constraints of their chosen approach, or denied the boundaries
it imposed.

We believe GPUs, TPUs, and other accelerators are the natural evolution of
compute going forward and demand high-quality software to achieve their full
potential. Therefore, we believe it's worthwhile to bet big, rather than do
something easier that might get near-term results but wither away over time as
AI and accelerator hardware continue to evolve rapidly.

## Overarching design principles

Because Mojo will evolve over time, it's essential to prioritize
deliberately—staying focused on our long-term goals while making pragmatic
short-term decisions. The following are the high-level design principles that
guide Mojo's development.

### Member of the Python family

Mojo adopts Python's syntax and should feel familiar to Python
developers—Python is not only [one of the most popular programming languages in
the world](https://www.tiobe.com/tiobe-index/), but it's also the dominant
language in AI. Python is beloved for its clean and readable syntax, small core
language (compared to many alternatives), powerful metaprogramming, and its
role as a "universal superglue" for integrating complex systems across language
boundaries.

That's why Mojo supports the core features Python programmers instinctively
reach for—`if`/`for` statements, lists, dictionaries, etc.—so it's easy to
migrate code. Mojo will support more Python features over time, but our primary
focus is on building features that unlock high-performance, portable
compute—not on quickly achieving surface-level Python compatibility.

### Scalable AI kernel development

A key principle for Mojo is to overcome the fundamental scalability limitations
that plague traditional kernel libraries and ML compilers, and become a unified
language for kernel development.

Kernel libraries, while initially useful, become [hard to manage as systems
grow](https://www.modular.com/blog/democratizing-ai-compute-part-5-what-about-cuda-c-alternatives).
ML compilers, despite their sophistication, often [lack the generality needed
for diverse
tasks](https://www.modular.com/blog/democratizing-ai-compute-part-6-what-about-ai-compilers)
like data loading, pre-processing, dynamic shapes, and sparsity, so they fail to
provide an "it just works" experience. Even other [MLIR-based compiler
systems](https://www.modular.com/blog/democratizing-ai-compute-part-8-what-about-the-mlir-compiler-infrastructure)
failed to solve this due to a fragmented development process that couldn't
scale to handle the constantly changing requirements in numerics, data types,
AI modeling, and hardware.

Thus, while building our inference engine for the Modular Platform, we wanted a
new way to write kernels that could scale with the ever-evolving AI industry.
We took inspiration from kernel programming systems (CUDA, CUTLASS, DSLs, etc.),
and built a way to express common kernel development patterns in MLIR. Then we
took a step further and generalized those patterns into a new language that's
suitable for high-performance kernel development. For example, Mojo includes
zero-cost abstractions, knobs that can be tuned for optimal hardware
performance, a library-first design, and metaprogramming to allow
specialization for particular hardware.

### A modern systems programming language

While Mojo builds on Python's syntax, it must address the realities of modern
accelerators, which are essentially high-performance embedded systems. For
example, you don't want to upload megabytes of code just to run a matrix
multiplication, and you can't afford implicit performance overhead in inner
loops. These requirements drive Mojo to go beyond Python with capabilities
designed for low-level numerical and hardware-focused programming.

Mojo introduces systems programming constructs such as a static type system
(dynamic typing will come later), memory management control, and predictable
performance semantics. It draws on lessons from modern languages like Swift,
C++, Rust, and Zig—and goes beyond them by embracing new ideas that target the
breadth of exotic hardware AI developers now face.

For more details, see
the [architectural bets for Mojo](#architectural-bets-for-mojo) below.

### Managing language complexity

The complexity of some systems programming languages (notably C++) has spiraled
out of control by continually adding new features that don't quite fit
together. This happens due to a ["tragedy of the
commons"](https://en.wikipedia.org/wiki/Tragedy_of_the_commons) situation where
every individual language feature is justified by some specific cohort or
use case, but all users of the language suffer from the aggregate complexity.

Python isn't perfect, but it has retained a relative simplicity—notably
evolving from Python 2 to 3 with care to improve its consistency and
orthogonality. Other programming languages like [Go](https://go.dev/) pride
themselves on maintaining simplicity and saying "no" to proposals that don't
benefit long-term goals ([example blog post explaining
this](https://blog.kowalczyk.info/article/d-2025-06-26/go-is-8020-language.html)).

Mojo must go beyond Python by adding new systems programming features aligned
with our mission—but in doing so, we face the same scope-creep pressures that
every growing language confronts.

We aim to control complexity through a few specific strategies:

1. **Use Mojo heavily inside Modular:** Modular is Mojo's largest user and
maintains the world's largest Mojo codebase (which is open source). This gives
us direct insight into real-world usability and performance. We use our own
experience, as well as feedback from our enthusiastic community, to guide
prioritization.

2. **Align with Python wherever possible:** If Python already supports a
feature, we adopt its design rather than inventing something new. Any deviation
from Python requires a strong, mission-driven justification.

3. **Adopt proven ideas from modern languages:** When new features are
required (such as static types, traits, metaprogramming), we draw from languages
like Rust, Swift, and Zig rather than creating novel and untested solutions.

4. **Innovate only when necessary:** Where existing designs fall short—such
as ergonomics in Rust or compile-time error messages in Zig—we aim beyond them
to meet Mojo's goals.

5. **Emphasize composability and simplicity:** Every Mojo feature must work
reliably in all situations and combine seamlessly with other features (compose
orthogonally). We're not satisfied with features that work 80% of the time but
fail in edge cases.

6. **Defer syntactic sugar:** Language sugar is often tempting, but we
prioritize core "big rocks" first. Only once the fundamentals are solid do we
revisit syntactic enhancements.

These are guiding principles, not a rigid recipe—language design is
fundamentally about balancing tradeoffs. The Mojo team draws on deep
experience, learns through continuous implementation and iteration, and listens
to feedback from the broader community.

## Architectural bets for Mojo

We believe building a new programming model for the future of AI and systems
programming requires first-principles thinking, not incremental evolution.
That's why our vision for Mojo is built upon a few specific architectural bets,
as described in this section.

From the start, our team made a foundational bet: By uniting three key
technologies, we can build a new kind of systems programming architecture that
scales across the full range of hardware targets while maximizing software
reuse across heterogeneous hardware.

Thus, Mojo's architecture is built upon the following technologies:

1. Powerful parametric meta-programming
2. MLIR Core
3. MAX framework integration

This design is built on experience, not speculation. In 2022, we spent most of
the year prototyping and validating this approach through deep compiler R&D.
Eventually, we had the architectural conviction that the idea was viable, but
only if we could improve and scale it. The next step was to make this power
accessible to developers, not just compiler engineers. That's what led us to
create Mojo.

Let's explore each of these architectural pillars in more detail.

### Powerful parametric meta-programming

Accelerators are incredibly diverse, and they're constantly evolving. Our
goal is to drastically reduce the time and effort required to bring up a
software stack for a new chip. We believe the work should be proportional
to *how different* that chip is, rather than starting from scratch for every
architecture.

The core insight behind our approach is this: while no two accelerators are
exactly alike, their target workloads and macro-architectures share deep
structural similarities. For example, NVIDIA's Hopper architecture extends from
Ampere. AMD's MI300 has meaningful overlap with both. And across the industry,
the "tensor core" has become ubiquitous, showing up in CPUs, GPUs, and custom
ASICs. These units may be quirky in their own ways, but their purpose is the
same: efficiently run matrix multiplications.

Previous attempts to democratize AI compute often failed to capitalize on this
commonality. Many were built on fragmented, vendor-specific libraries like
cuBLAS or rocBLAS, which prevented true cross-architecture unification. At
Modular, we made a different bet: we could reimplement and unify these software
stacks ourselves—for example, build a graph compiler and runtime stack (MAX)
*without* CUDA—and use that as a basis to abstract across architectures.

Of course, this only works if it scales. The challenge is combinatorial: the
cross-product of all data types, operators, and hardware targets is too large
to implement by hand. That's why we leaned into powerful meta-programming. We
took the ideas behind C++ templates (compile-time polymorphism and
specialization) and built something dramatically more usable, with better error
messages, faster compile times, more expressiveness, and a smoother
developer experience.
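To make the combinatorial problem concrete, here is a minimal sketch in plain Python (not Mojo). The axis names, the `make_dot` factory, and the tile widths are all hypothetical and chosen only for illustration. Python has no compile-time parameters, so a closure stands in for the idea: one generic kernel definition is specialized by a parameter fixed at creation time, rather than hand-written once per combination.

```python
from itertools import product
from typing import Callable, List

# Hypothetical axes of variation; the entries are illustrative only.
DTYPES = ["float32", "bfloat16", "float8"]
OPERATORS = ["matmul", "conv2d", "softmax", "layernorm"]
TARGETS = ["cpu_avx512", "gpu_vendor_a", "gpu_vendor_b"]


def count_handwritten_kernels() -> int:
    """Number of kernels needed if every combination is written by hand."""
    return len(list(product(DTYPES, OPERATORS, TARGETS)))


def make_dot(tile: int) -> Callable[[List[float], List[float]], float]:
    """Return a dot product specialized for one tile width.

    `tile` plays the role of a compile-time parameter: it is fixed when the
    kernel is created, not passed on every call.
    """

    def dot(a: List[float], b: List[float]) -> float:
        assert len(a) == len(b)
        total = 0.0
        # Walk the inputs one tile at a time, as a hardware-tuned kernel might.
        for start in range(0, len(a), tile):
            stop = min(start + tile, len(a))
            for i in range(start, stop):
                total += a[i] * b[i]
        return total

    return dot


# One generic definition, several specializations.
dot_narrow = make_dot(tile=4)
dot_wide = make_dot(tile=16)
```

Even with only three or four entries per axis, the hand-written approach already needs dozens of kernels, while the parameterized form keeps a single definition. A real parameter system resolves this at compile time with no runtime cost, which a closure only approximates.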

In late 2022, we validated this approach with an early prototype of the Mojo
parameter system. Even in its primitive form, it enabled us to implement matrix
multiplication in a unified way and match or [exceed vendor BLAS libraries
across a range of
CPUs](https://www.modular.com/blog/the-worlds-fastest-unified-matrix-multiplication).
This architectural bet has paid off many times over—it's what allows MAX to
scale across hardware with high performance and maintainable code.

### MLIR Core

MLIR is a widely used compiler infrastructure for building domain-specific
compilers—powering systems across AI accelerators, CPUs, hardware design,
quantum computing, and more. Within it, you can think of *MLIR Core* as a
flexible "compiler construction toolkit," providing the building blocks needed
to create powerful custom compilers.

:::note

The broader MLIR project includes many AI-related dialects, such as `linalg`,
`affine`, and `scf`, but Mojo doesn't use any of these—Mojo is built purely on
top of MLIR Core.

:::

Mojo is powered by a novel compiler framework, historically code-named **KGEN**
(for "kernel generator"). KGEN is built using MLIR Core and forms the
backbone of Mojo's metaprogramming capabilities. It allows explicitly
parametric code to be represented *before* instantiation, which enables a host
of benefits: faster compile times, clearer error messages, and support for
compiling the same source code to multiple target devices.

Another key design choice in Mojo is that it acts as syntactic sugar for MLIR.
This means Mojo code can directly express MLIR dialect operations without
requiring changes to the Mojo compiler itself. While not all MLIR dialects are supported,
Mojo is designed to cover the most important ones needed for accelerator
programming. Broader dialect support is possible in the future, but it's not a
near-term priority.

For a deep dive into how Mojo uses MLIR and KGEN, see the video
[Modular Tech Talk: Kernel Programming and
Mojo](https://www.youtube.com/watch?v=Invd_dxC2RU).

### MAX framework integration

Mojo's low-level programming model and MLIR-based foundation make it easy to
write high-performance code, but raw kernel performance isn't the whole story.
In AI and other advanced domains, some of the biggest gains come from
graph-level optimizations like *kernel fusion*.
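As a rough illustration of what kernel fusion buys, the following plain-Python sketch (not Mojo or MAX code) compares two element-wise operations run as separate passes against a single fused pass. The function names are hypothetical, and the lists stand in for device buffers.

```python
from typing import List


def scale(xs: List[float], factor: float) -> List[float]:
    """First kernel: multiply every element by `factor`."""
    return [x * factor for x in xs]


def add_bias(xs: List[float], bias: float) -> List[float]:
    """Second kernel: add `bias` to every element."""
    return [x + bias for x in xs]


def unfused(xs: List[float], factor: float, bias: float) -> List[float]:
    """Two passes over the data, with an intermediate buffer in between."""
    tmp = scale(xs, factor)  # intermediate result written out in full
    return add_bias(tmp, bias)  # then read back by the second kernel


def fused(xs: List[float], factor: float, bias: float) -> List[float]:
    """One pass over the data, with no intermediate buffer."""
    return [x * factor + bias for x in xs]
```

Both versions compute the same result, but the fused form reads the input once and never materializes the intermediate buffer. On an accelerator, that intermediate traffic is often the dominant cost, which is why a graph compiler that can see across kernels has room to optimize.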

That's why Mojo is designed to integrate seamlessly into the **MAX
framework**—our graph compiler and runtime stack. This integration allows
developers to directly extend MAX using Mojo (for example, write custom graph
ops)—without modifying the graph compiler itself—and still benefit from
advanced optimizations and code transformations. This is made possible by Mojo
building an MLIR representation of the kernel code before instantiating it with
parameters. This intermediate representation (IR) allows the graph compiler to
reflect over the kernel to understand its inputs and outputs, and to transform
the IR of custom kernels directly.

As Mojo evolves, a key goal remains enabling and enriching the MAX framework.
We want to unlock new forms of optimization and fusion that only become
possible when you reason across combinations of kernels—not just individual
operators. These kinds of transformations can dramatically improve performance,
reduce memory usage, and lower the cost of deploying high-performance AI
systems at scale.

## Looking ahead

Language design is expensive. It's difficult and ambiguous. But when done
right, it creates leverage that compounds over time, enabling not just
performance, but creativity, composability, and community growth.

Although Mojo is still early in its journey, it stands on a carefully
engineered foundation—one designed to scale across devices, abstractions, and
time. These investments are already paying off, and we believe they position
Mojo to grow into a truly transformative technology for the AI era and beyond.

We're building it together, and we're **building it to last**. We want to
eventually create a vibrant ecosystem where people use Mojo to build and share
a wide range of AI applications, large-scale distributed systems, database
connectors, and much more—putting the power of the world's compute hardware at
your fingertips. That's when we'll feel like Mojo is truly on fire 🔥!

For more detail about what's left to do, see the [Mojo roadmap](/mojo/roadmap).
